engine: add input grace period and check pending chunks on shutdown #9952

Merged: 4 commits, May 29, 2025

Conversation

@singholt commented Feb 17, 2025

What does this PR do?

This PR makes the following changes:

  1. Add an input grace period:

Currently, Fluent Bit pauses all inputs 1 second after SIGTERM. This PR adds an input grace period, which defaults to half of the total "Grace" setting. Halfway through the grace period, Fluent Bit stops accepting new logs and only sends the logs already pending in the buffers.

  2. Check pending chunks on shutdown:

Previously, the engine shut down immediately if there were no pending tasks. A task is created from a chunk in the buffer, so if there is a new chunk but no task yet, the engine should keep running until the task is created and completed. With this change, the engine waits on shutdown for all pending chunks to be processed, or until the maximum grace period has expired.
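The two changes above can be sketched as a small model. This is an illustrative Python sketch, not the engine's C code: `ShutdownState`, `input_grace`, `inputs_accepting`, `should_exit_before`, and `should_exit_after` are hypothetical names, and rounding an odd "Grace" value down is an assumption. It only models what the description states: inputs stop at half the grace period, and the engine keeps running while chunks or tasks are pending, up to the full grace period.

```python
from dataclasses import dataclass


@dataclass
class ShutdownState:
    """Hypothetical snapshot of the engine some seconds after SIGTERM."""
    elapsed: int         # seconds since SIGTERM was received
    pending_tasks: int   # tasks already created from chunks
    pending_chunks: int  # chunks in the buffer that have no task yet


def input_grace(grace: int) -> int:
    """Default input grace period: half of the total grace period.

    Integer division for odd values is an assumption of this sketch.
    """
    return grace // 2


def inputs_accepting(elapsed: int, grace: int) -> bool:
    """Inputs keep accepting logs only during the input grace period."""
    return elapsed < input_grace(grace)


def should_exit_before(state: ShutdownState, grace: int) -> bool:
    """Old behavior as described: only pending tasks keep the engine alive."""
    if state.elapsed >= grace:
        return True
    return state.pending_tasks == 0


def should_exit_after(state: ShutdownState, grace: int) -> bool:
    """New behavior as described: pending chunks also keep the engine alive."""
    if state.elapsed >= grace:
        return True  # the maximum grace period always wins
    return state.pending_tasks == 0 and state.pending_chunks == 0
```

With a 30 second grace period, this model stops inputs at 15 seconds. A state with one buffered chunk and no task would exit under the old check but keeps running under the new one.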

What use-case does this PR aim to solve?

In production environments with high-throughput logging, applications can generate significant volumes of logs even during the shutdown phase. Container orchestration services, such as Amazon ECS, provide containers with a (configurable) graceful shutdown period (default is 30 seconds in ECS) to properly terminate their operations. However, the current implementation may lead to dropped logs during this shutdown process, as it immediately stops accepting inputs after SIGTERM and may not process all buffered data.

By introducing an input grace period and improving the pending chunk verification, Fluent Bit can make better use of the provided shutdown window: it continues to accept critical logs for a portion of the grace period while ensuring that all buffered data is processed and delivered to its destinations. The shutdown time is spent delivering logs rather than simply discarding unprocessed input.

These improvements are also valuable when applications perform controlled shutdowns due to conditions like OOM or health check failures - capturing crucial diagnostic logs during the application's final moments.


Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

@singholt changed the title from "[wip] do not review yet" to "[wip] engine: add input grace period and check pending chunks on shutdown" Feb 19, 2025
@singholt marked this pull request as ready for review February 19, 2025 21:42
@singholt changed the title from "[wip] engine: add input grace period and check pending chunks on shutdown" to "engine: add input grace period and check pending chunks on shutdown" Feb 19, 2025
@singholt
Contributor Author

Hi @edsiper / @leonardo-albertovich, could you please review this PR and provide your feedback? Thanks!

Member

@edsiper left a comment

comments added

@singholt force-pushed the engine-grace-input branch from ccbcbf8 to 77d61d1 May 19, 2025 16:25
singholt added a commit to singholt/fluent-bit-docs that referenced this pull request May 19, 2025
singholt added a commit to singholt/fluent-bit-docs that referenced this pull request May 19, 2025
@singholt
Contributor Author

Docs PR: fluent/fluent-bit-docs#1667

@leonardo-albertovich
Collaborator

I'll review this PR tomorrow.

@leonardo-albertovich self-assigned this May 19, 2025
Signed-off-by: Anuj Singh
Co-authored-by: Wesley Pettit
@leonardo-albertovich
Collaborator

Are you done making changes @singholt? I want to review this today but only if it's the final code.

@singholt
Contributor Author

singholt commented May 20, 2025

Are you done making changes @singholt? I want to review this today but only if it's the final code.

Yes, I just fixed the compilation error that CI caught! It's ready for your review.

@singholt
Contributor Author

@leonardo-albertovich PTAL

@cosmo0920 requested review from cosmo0920 and removed request for fujimotos May 29, 2025 02:13
Contributor

@cosmo0920 left a comment

From my side, there are no objections. It looks good. 👍

@edsiper merged commit 66950a5 into fluent:master May 29, 2025
67 of 72 checks passed
@edsiper
Member

edsiper commented May 29, 2025

thanks folks!

I think the last step (if not ready yet) would be to submit the changes for the docs to describe the new storage.backlog.flush_on_shutdown config property

@singholt
Contributor Author

thanks folks!

I think the last step (if not ready yet) would be to submit the changes for the docs to describe the new storage.backlog.flush_on_shutdown config property

Here: fluent/fluent-bit-docs#1667. It's ready to be merged alongside the release.

lecaros pushed a commit to fluent/fluent-bit-docs that referenced this pull request May 30, 2025
5 participants